Greedy and Relaxed Approximations to Model Selection: A simulation study

Authors

  • Guilherme V. Rocha
  • Bin Yu
Abstract

The Minimum Description Length (MDL) principle is an important tool for retrieving knowledge from data, as it embodies the scientific striving for simplicity in describing the relationship among variables. Because MDL and other model selection criteria penalize models on their dimensionality, the estimation problem involves a combinatorial search over subsets of predictors and quickly becomes computationally cumbersome. Two frameworks for approximating this search are convex relaxation and greedy algorithms. In this article, we perform extensive simulations comparing two algorithms for generating candidate models that mimic the best subsets of predictors for given sizes: Forward Stepwise and the Least Absolute Shrinkage and Selection Operator (LASSO). From the list of models determined by each method, we consider estimates chosen by two different model selection criteria: AICc and the generalized MDL criterion (gMDL). The comparisons are made in terms of selection and prediction performance. For variable selection, we consider two different metrics. For the first metric, the number of selection errors, our results suggest that the combination Forward Stepwise+gMDL performs better across different sample sizes and sparsity regimes. For the second metric, the rate of true positives among the selected variables, LASSO+gMDL seems more appropriate for very small sample sizes, while Forward Stepwise+gMDL performs better for sample sizes at least as large as the number of factors being screened. Moreover, we found that, asymptotically, Zhao and Yu's (1) irrepresentability condition (index) has a larger impact on the selection performance of LASSO than on that of Forward Stepwise. With regard to prediction performance, LASSO+AICc results in good predictive models over a wide range of sample sizes and sparsity regimes. Last but not least, these simulation results reveal that a single method often cannot serve both selection and prediction purposes.
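As a rough illustration of one of the combinations named in the abstract, the sketch below builds a Forward Stepwise path of nested predictor subsets and then picks one subset with AICc. It is not the authors' code. The function names are hypothetical, the data are assumed to be centered (no intercept is fitted), and the AICc expression used is one common Gaussian form, n log(RSS/n) + 2k + 2k(k+1)/(n-k-1), which may differ from the variant used in the article. The gMDL criterion and the LASSO path are not shown.

```python
import numpy as np

def rss_of(A, y):
    """Residual sum of squares of the least-squares fit of y on the columns of A."""
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return float(resid @ resid)

def forward_stepwise_path(X, y):
    """Greedy path: at each step add the predictor that lowers the RSS the most."""
    n, p = X.shape
    selected, remaining, path = [], list(range(p)), []
    # Stop at n - 2 predictors so the AICc denominator n - k - 1 stays positive.
    while remaining and len(selected) < n - 2:
        rss, j = min((rss_of(X[:, selected + [j]], y), j) for j in remaining)
        selected.append(j)
        remaining.remove(j)
        path.append((list(selected), rss))
    return path

def aicc(rss, n, k):
    """One common Gaussian form of AICc (an assumption, see the text above)."""
    return n * np.log(rss / n) + 2 * k + 2 * k * (k + 1) / (n - k - 1)

def select_by_aicc(path, n):
    """Return the subset on the path with the smallest AICc value."""
    return min(path, key=lambda step: aicc(step[1], n, len(step[0])))[0]

# Small synthetic example: 3 active predictors out of 10 (illustrative values only).
rng = np.random.default_rng(0)
n, p = 200, 10
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[[0, 3, 7]] = [3.0, -2.0, 1.5]
y = X @ beta + rng.standard_normal(n)

path = forward_stepwise_path(X, y)
print(sorted(select_by_aicc(path, n)))
```

The path construction and the selection criterion are kept separate on purpose: the same `select_by_aicc` step could be applied to a list of subsets produced by another method, which mirrors the two-stage design (candidate generation, then criterion-based choice) described in the abstract.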


Similar resources

Approximation and learning by greedy algorithms

We consider the problem of approximating a given element f from a Hilbert space H by means of greedy algorithms and the application of such procedures to the regression problem in statistical learning theory. We improve on the existing theory of convergence rates for both the orthogonal greedy algorithm and the relaxed greedy algorithm, as well as for the forward stepwise projection algorithm. ...


Using Greedy Clustering Method to Solve Capacitated Location-Routing Problem with Fuzzy Demands

In this paper, the capacitated location-routing problem with fuzzy demands (CLRP_FD) is considered. In CLRP_FD, the facility location problem (FLP) and the vehicle routing problem (VRP) are considered simultaneously. The vehicles and the depots have a predefined capacity to serve the customerst...


Scalable Greedy Feature Selection via Weak Submodularity

Greedy algorithms are widely used for problems in machine learning such as feature selection and set function optimization. Unfortunately, for large datasets, the running time of even greedy algorithms can be quite high. This is because for each greedy step we need to refit a model or calculate a function using the previously selected choices and the new candidate. Two algorithms that are faste...


Approximate blocking probabilities in loss models with independence and distribution assumptions relaxed

Effective approximations are developed for the blocking probability in a general stationary loss model, where key independence and exponential-distribution assumptions are relaxed, giving special attention to dependence among successive service times, not studied before. The new approximations exploit heavy-traffic limits for the steady-state number of busy servers in the associated infinite-se...


Haplotype Block Partitioning and tagSNP Selection under the Perfect Phylogeny Model

Single Nucleotide Polymorphisms (SNPs) are the most common form of polymorphism in the human genome. Analyses of genetic variations have revealed that individual genomes share common SNP-haplotypes. The particular pattern of these common variations forms a block-like structure on the human genome. In this work, we develop a new method based on the Perfect Phylogeny Model to identify haplo...




Publication date: 2008